Mining of Frequent Block Preserving Outerplanar Graph Structured Patterns
Authors
Abstract
An outerplanar graph is a planar graph that can be embedded in the plane so that all of its vertices lie on the outer boundary. Many semi-structured datasets, such as the NCI dataset of about 250,000 chemical compounds, can be expressed as outerplanar graphs. In this paper, we consider the data mining problem of extracting structural features from semi-structured data such as the NCI dataset. For this problem, we first define a new graph pattern, called a block preserving outerplanar graph pattern, as an outerplanar graph having structured variables. We then present an effective Apriori-like algorithm that enumerates frequent block preserving outerplanar graph patterns from semi-structured data in incremental polynomial time. Finally, we evaluate the performance of our algorithms by reporting preliminary experimental results on a subset of the NCI dataset.
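An Apriori-like algorithm enumerates patterns level by level. It starts from the smallest frequent patterns, extends each frequent pattern into larger candidates, discards any candidate that has an infrequent sub-pattern (frequency is anti-monotone, so such a candidate cannot itself be frequent), and keeps the candidates that occur in at least a minimum number of records. The following is a minimal, generic sketch of that level-wise scheme. It assumes itemsets as a stand-in pattern class. The paper's refinement operator for block preserving outerplanar graph patterns and its pattern-matching test are not reproduced here, and the function names `apriori` and `support` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

# Stand-in pattern class (assumption): an itemset instead of a graph pattern.
Pattern = FrozenSet[str]


def support(pattern: Pattern, database: List[Set[str]]) -> int:
    """Count the records that contain the pattern (the 'matching' step)."""
    return sum(1 for record in database if pattern <= record)


def apriori(database: List[Set[str]], min_support: int) -> List[Pattern]:
    """Level-wise (Apriori-style) enumeration of all frequent patterns."""
    items = sorted({item for record in database for item in record})
    # Level 1: frequent patterns of size one.
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), database) >= min_support]
    frequent: List[Pattern] = list(level)
    k = 1
    while level:
        prev = set(level)
        # Candidate generation: join two frequent k-patterns into a (k+1)-pattern.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == k + 1}
        # Anti-monotone pruning: every k-sub-pattern must already be frequent.
        candidates = {c for c in candidates
                      if all(c - {x} in prev for x in c)}
        # Support counting: keep candidates that meet the threshold.
        level = [c for c in candidates if support(c, database) >= min_support]
        frequent.extend(level)
        k += 1
    return frequent
```

For example, `apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}], 2)` yields the three single items and the three pairs. `{"a", "b", "c"}` survives pruning but is dropped because only one record contains it. The order of patterns within a level is not fixed, so compare results as sets. In the graph setting, the join step would be replaced by a pattern-refinement operator and `support` by a subgraph-matching test. Neither is shown here.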
Similar resources
Mining frequent subgraphs from ‘easy’ classes
Recently, there has been increasing interest in mining structured data. Several frequent subgraph mining systems have been proposed. However, these usually consider general graphs. One can show that frequent subgraph mining for general graphs cannot be performed in output-polynomial time. In practice, however, data usually does not consist of arbitrary graphs but has a much simpler structure. In t...
LWA 2006 Proceedings
In recent years there has been an increased interest in frequent pattern discovery in large databases of graph structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of...
An Efficiently Computable Graph-Based Metric for the Classification of Small Molecules
In machine learning, there has been an increased interest in metrics on structured data. The application we focus on is drug discovery. Although graphs have become very popular for the representation of molecules, many operations on graphs are NP-complete. Representing the molecules as outerplanar graphs, a subclass within general graphs, and using the block-and-bridge preserving subgraph i...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...
Frequent Pattern Mining from Dense Graph Streams
As technology advances, streams of data can be produced in many applications such as social networks, sensor networks, bioinformatics, and chemical informatics. These kinds of streaming data share a common property: they can be modeled as graph-structured data. Here, the data streams generated by graph data sources in these applications are graph streams. To extract implicit,...